Add disable=True/False flag for Spark autologging #20

Open
mohamad-arabi wants to merge 24 commits into dbczumar:interf_proto from mohamad-arabi:disable-flag-spark

Conversation

@mohamad-arabi

What changes are proposed in this pull request?

(Please fill in changes proposed in this fix)

How is this patch tested?

(Details)

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

mohamad-arabi and others added 6 commits December 9, 2020 16:24
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
…ion tests (mlflow#3800)

* skip if the matrix is empty

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* set is_matrix_empty

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix syntax error

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* minor comment fix

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
* Fix for xgboost 1.3.0

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* do not include 1.3.0 since it has been removed

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* re-run all the tests if set_matrix contains changes

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* nit

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix regexp

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* add test case

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Refactor using packaging

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* add packaging

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* nit

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
mohamad-arabi changed the base branch from master to interf_proto December 11, 2020 19:22
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
from tests.spark_autologging.utils import file_path # pylint: disable=unused-import


# Note that the following tests run one-after-the-other and operate on the SAME spark_session
Owner

@mohamad-arabi This is awesome! Can we also test the case where there isn't a preexisting spark session and we call autolog() with disable=True/False before creating a session?

Author

Owner

These just test that no exceptions are thrown. They don’t verify that tags are set or not set, depending on the disable flag, when a session is created after autolog is called. Let me know if you think we already capture that elsewhere.
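
A rough sketch of the shape such a test could take is below. It is only an illustration: `FakeAutologState`, `run_scenario`, and the `datasource_info` tag key are hypothetical stand-ins written with the standard library, not MLflow or Spark APIs, and the real tests would presumably use the shared fixtures in `tests.spark_autologging.utils`. The point is the ordering (call autolog first, create the session second) and the assertion on the tag itself rather than on the absence of exceptions.

```python
class FakeAutologState:
    """Hypothetical stand-in for Spark autologging state; not MLflow code."""

    def __init__(self):
        self.disabled = False
        self.session_active = False
        self.tags = {}

    def autolog(self, disable=False):
        # The flag is recorded even though no session exists yet.
        self.disabled = disable

    def create_session(self):
        # Stands in for building a SparkSession after autolog() was called.
        self.session_active = True

    def read_datasource(self, path, fmt):
        if not self.session_active:
            raise RuntimeError("no active session")
        # A tag is written only when autologging has not been disabled.
        if not self.disabled:
            self.tags["datasource_info"] = f"path={path},format={fmt}"


def run_scenario(disable):
    """Call autolog() BEFORE the session exists, then read a datasource."""
    state = FakeAutologState()
    state.autolog(disable=disable)
    state.create_session()
    state.read_datasource("/tmp/data.csv", "csv")
    return state.tags


def test_tag_set_when_autolog_enabled_before_session():
    assert "datasource_info" in run_scenario(disable=False)


def test_tag_absent_when_autolog_disabled_before_session():
    assert run_scenario(disable=True) == {}
```

Both cases check the resulting tags directly, which is the gap described in the comment above: a test that only confirms no exception was raised would pass in either scenario.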

dbczumar and others added 13 commits December 11, 2020 17:17
…3682)

* Safe

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Keras

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* TF

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fixes

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Some unit tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* More unit tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Test coverage for safe_patch

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Add public API for autologging integration configs

Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Remove big comment

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Conf tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Mark large

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Whitespace

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Blackspace

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Rename

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Simplify, will raise integrations as separate PR

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Remove test_mode_off for now

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Support positional arguments

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Docstring fix

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* use match instead of comparison to str(exc)

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Forward args

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Try  importing mock from unittest?

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fix import mock in statsmodel

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Revert "Fix import mock in statsmodel"

This reverts commit a81e810.

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Support tuple

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Address more comments

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Stop patching log_param

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

Co-authored-by: Mohamad Arabi <mohamad.arabi@databricks.com>
* reject bool metric value

Signed-off-by: Halil Coban <halil.coban@gmail.com>

* add comment on why we check for bool

Signed-off-by: Halil Coban <halil.coban@gmail.com>
* initial commit

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* Updated tests to reflect new type conversion rules and to make sure we include hint message when necessary.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* fix tests.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* lint.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* fix.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* fix.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* fix.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* minor fix

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* lint

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* revert

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* update

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* Update doc.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* fix.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* fix docs.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* add hint/warning to schema inference

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* Addressed review comments.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>

* Addressed review comments.

Signed-off-by: tomasatdatabricks <tomas.nykodym@databricks.com>
…py < 3.0.0 (mlflow#3825)

* Fix AttributeError: 'Dataset' object has no attribute 'value'

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix reimport

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* remove print

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
* Add gluon to cross-version-tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix version

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix metric import

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* newline

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* fix typo & pylint error

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* use load_parameters

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix import

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix test_gluon_model_export.py

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Add # pylint: disable=import-error

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix import position

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* nit

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
* Fix invalid metric issue in statsmodels flavor

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Introduce _is_numeric

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
* Add fastai to the cross version tests

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* add sklearn

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
…low#3815)

* Safe

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Keras

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* TF

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fixes

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Some unit tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* More unit tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Test coverage for safe_patch

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Add public API for autologging integration configs

Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Remove big comment

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Conf tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Mark large

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Whitespace

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Blackspace

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Rename

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Simplify, will raise integrations as separate PR

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Remove partial tensorflow

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Updates from utils

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Remove test_mode_off for now

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Support positional arguments

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Docstring fix

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* use match instead of comparison to str(exc)

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Forward args

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fixes from mlflow#3682

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* integration start

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Try  importing mock from unittest?

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fix import mock in statsmodel

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Mock fix

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Revert "Fix import mock in statsmodel"

This reverts commit a81e810.

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Support tuple

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Address more comments

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Stop patching log_param

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Modules

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Another test, enable test mode broadly

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Black

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fix

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Move to fixture

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Docstring

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Use test mode for try_mlflow_log

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Test try_mlflow_log

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Docs

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Assert

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Try log keras

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Review comment, add init for tests

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Actually commit the fixtures file...

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Test fixes, lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fix, format

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Fix fast.ai

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lintfix

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Docstrings

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Address nit

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

* Lint

Signed-off-by: Corey Zumar <corey.zumar@databricks.com>

Co-authored-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>
Signed-off-by: Mohamad Arabi <mohamad.arabi@databricks.com>

5 participants